Abstract

One of the key points addressed by Per Bak in his models of brain function was 
that biological neural systems must be able not just to learn, but also to adapt — to 
quickly change their behaviour in response to a changing environment. I discuss 
this in the context of various simple learning rules and adaptive problems, centred 
around the Chialvo-Bak 'minibrain' model [Neurosci. 90 (1999) 1137-1148]. 
1 Introduction



When attempting to model biological learning, what factors should we take 
into account, and what sort of problems should we expect our models to solve? 
One of the things which Per Bak always emphasized was that it was not enough 
simply to learn one task fast: biological neural dynamics had to be able to 
adapt, to unlearn patterns of behaviour that were no longer working and find
new ones. For example, the important early work on 'reinforcement learning'
by Barto and colleagues [1], which produced much more biologically
plausible learning rules than those previously considered, still foundered on 
this problem, with networks having to be completely reset in order to learn a 
new problem. 

Particular progress in this regard was made by the work of Dimitris Stassinopoulos,
in collaboration with Preben Alstrøm [4] and Per himself. However, it
was a few years later that an especially elegant model was developed by Per
in collaboration with Dante Chialvo. This model, which I rather cheekily
dubbed the 'minibrain'¹, addressed the problem of adaptation by assuming
that learning was by only long-term synaptic depression (LTD), the weaken- 
ing of connections. Synapses involved in bad decisions were suppressed, but 
only enough to render them inactive. Meanwhile, because synapses were not 
strengthened or reinforced in any way — there was a complete absence of long- 
term potentiation (LTP) — the strengths of active connections remained barely 
greater than those of the inactive ones. Thus, the network could easily switch 
to using a different set of connections if the need arose. It was suggested that it 
was the absence of any strengthening of connections in the model that was the 
key to its adaptive ability. In the present work I illustrate this by investigating 
the adaptive ability of the minibrain when simple forms of LTP are included, 
compared to the original model and the 'selective punishment' extension later
proposed.



2 The model 



For simplicity I consider here only the basic feedforward minibrain: 3 layers
of neurons ('input', 'intermediary' and 'output') of size $n_{ip}$, $n_{im}$ and
$n_{op}$ respectively. Each neuron in the input layer has a one-way connection to every
neuron in the intermediary layer, and similarly each intermediary neuron has 
a connection to every output. Each connection is assigned a strength value, 
initially evenly distributed in the interval [0, W] (here W = 1). Activity prop- 
agates according to extremal dynamics: if we stimulate a neuron, the signal 
travels along the single strongest outgoing connection, and the neuron at the 
end of that connection then fires, and so on until an output neuron fires. 
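The propagation rule just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`make_layer`, `strongest`, `propagate`) and the nested-list representation of the connection strengths are hypothetical choices, not taken from the original model code, and the fixed seed is there purely for reproducibility.

```python
import random

def make_layer(n_from, n_to, w=1.0, rng=random):
    """Strengths for every connection, drawn uniformly from [0, w]."""
    return [[rng.uniform(0.0, w) for _ in range(n_to)] for _ in range(n_from)]

def strongest(row):
    """Index of the single strongest outgoing connection of one neuron."""
    return max(range(len(row)), key=row.__getitem__)

def propagate(w_in, w_out, stimulus):
    """Extremal dynamics: input -> intermediary -> output along strongest links."""
    mid = strongest(w_in[stimulus])
    out = strongest(w_out[mid])
    return mid, out

rng = random.Random(0)            # fixed seed, only for reproducibility
n_ip, n_im, n_op = 10, 100, 10    # layer sizes used in the text
w_in = make_layer(n_ip, n_im, rng=rng)
w_out = make_layer(n_im, n_op, rng=rng)
mid, out = propagate(w_in, w_out, stimulus=3)
```

Only the ordering of strengths matters for which neuron fires, which is why small changes to a few connections can redirect the whole path.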

Should this output be incorrect, a negative feedback signal is sent to the system
and the connections responsible are punished by having their strengths
reduced by a random amount in the interval $[0, \delta]$ (here $\delta = 1$). Learning
efficiency is measured by the total number $\phi_i$ of such signals required for complete
learning. Depending on a control parameter $\zeta = n_{im}/(n_{ip} n_{op})$, two phases of
behaviour are identifiable: for $\zeta < 1$ the network is in the disordered phase
where complete learning is impossible and $\phi_i = \infty$. For $\zeta > 1$ complete
learning becomes possible with $\phi_i \approx n_{ip} n_{op}$. In the present work, $n_{ip} = 10$,
$n_{im} = 100$ and $n_{op} = 10$, with the $n_{im}$ value picked to ensure the network is in
the ordered phase so complete learning will always be possible.
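A minimal sketch of the punishment rule and of how the number of negative feedback signals might be counted is given below. It assumes, as in the text, that both connections on the firing path are depressed by independent random amounts and that learning is complete after one error-free pass through all inputs. The names `learn` and `max_signals`, and the identity map used as a target, are illustrative assumptions.

```python
import random

def strongest(row):
    """Index of the strongest outgoing connection."""
    return max(range(len(row)), key=row.__getitem__)

def learn(w_in, w_out, target, delta=1.0, rng=random, max_signals=1_000_000):
    """Cycle through the inputs, depressing the connections behind each wrong
    output, until one full pass is error-free. Returns the number of negative
    feedback signals that were needed."""
    signals = 0
    while signals < max_signals:
        clean_pass = True
        for stimulus, wanted in enumerate(target):
            mid = strongest(w_in[stimulus])
            out = strongest(w_out[mid])
            if out != wanted:
                # LTD only: weaken both connections on the path that fired
                w_in[stimulus][mid] -= rng.uniform(0.0, delta)
                w_out[mid][out] -= rng.uniform(0.0, delta)
                signals += 1
                clean_pass = False
        if clean_pass:
            return signals
    raise RuntimeError("no complete learning within max_signals")

rng = random.Random(1)
n_ip, n_im, n_op = 10, 100, 10
w_in = [[rng.uniform(0.0, 1.0) for _ in range(n_im)] for _ in range(n_ip)]
w_out = [[rng.uniform(0.0, 1.0) for _ in range(n_op)] for _ in range(n_im)]
target = list(range(n_op))        # illustrative map: input k -> output k
signals = learn(w_in, w_out, target, rng=rng)
```

The `max_signals` guard is a practical safeguard for parameter choices in which complete learning is not possible; it is not part of the model itself.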

¹ Much to my embarrassment, after Per and I published our collaboration using
this name, Per told me that he had always called it the 'Dante brain', and Dante
called it 'Per learning'. Readers are invited to draw their own conclusions but should
not read anything whatsoever into the title of the present paper... ;-)
Fig. 1. Adaptive ability of the minibrain model when unbounded LTP is included,
compared to the traditional version with LTD only. (a) At each adaptation, one
input-output association is randomly reselected ('slow change'). The inset shows
more clearly the results for smaller $\nu$. Note the astonishing increase in the required
value of $\phi_i$ for $\nu > 0$: only for the smallest values, $\nu < 0.1 = 1/n_{ip}$, is this bounded,
and then performance is still significantly worse than for the LTD-only network. (b)
The network is required to adapt successively back and forth between two different
input-output maps (the 'flip-flop' problem). $\langle \phi_i \rangle$ is bounded in all cases, but
nevertheless $\nu = 0$ (LTD only) provides the best performance. Data averaged over 128
realizations.



3 Unbounded potentiation 



The simplest manner of including LTP is symmetric with the negative feedback:
in the event of a successful decision, the connections responsible can
be rewarded by having their strengths increased by a random amount in the
interval $[0, \nu]$. How does this affect the network's adaptive ability?
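The symmetric reward and punishment rule might be written as follows. This is a sketch only: `apply_feedback` is a hypothetical helper, the value `nu=0.1` is an arbitrary default, and the tiny hand-written weight lists exist solely to exercise the function.

```python
import random

def apply_feedback(w_in, w_out, stimulus, mid, out, success,
                   delta=1.0, nu=0.1, rng=random):
    """Reward (LTP) or punish (LTD) the two connections on the firing path."""
    if success:
        # potentiation: random increase drawn from [0, nu]
        w_in[stimulus][mid] += rng.uniform(0.0, nu)
        w_out[mid][out] += rng.uniform(0.0, nu)
    else:
        # depression: random decrease drawn from [0, delta]
        w_in[stimulus][mid] -= rng.uniform(0.0, delta)
        w_out[mid][out] -= rng.uniform(0.0, delta)

rng = random.Random(2)
w_in = [[0.5, 0.2]]
w_out = [[0.3, 0.9], [0.4, 0.1]]
apply_feedback(w_in, w_out, stimulus=0, mid=0, out=1, success=True, rng=rng)
rewarded = (w_in[0][0], w_out[0][1])
apply_feedback(w_in, w_out, stimulus=0, mid=0, out=1, success=False, rng=rng)
punished = (w_in[0][0], w_out[0][1])
```

Setting `nu=0` recovers the LTD-only rule, so the same function can serve for both variants.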

Suppose that we present a network with an input-output map to learn and,
each time learning is completed, randomly reroll one of the input-output
associations. We can call this the 'slow change' problem. Naturally we want to
know what value of $\phi_i$ the network will require to adapt to the new maps².
We stimulate each of the inputs in turn and apply LTP or LTD as necessary;
learning is deemed complete when we can run through all the inputs without
error. Fig. 1a shows how the average $\langle \phi_i \rangle$ varies with successive adaptations.
Most of the networks with $\nu > 0$ very quickly become hopelessly addicted,
requiring huge amounts of negative feedback to adapt to new input-output
maps; this is escaped only with the smallest values of $\nu > 0$. Indeed, we
can observe two distinct phases of behaviour: for $\nu < 0.10$, $\langle \phi_i \rangle$ is bounded
(though always worse than the $\nu = 0$ case), while for $\nu > 0.10$ we have the
real addiction with $\langle \phi_i \rangle$ growing exponentially.

² Including LTP in this way might lead one to question whether it is still appropriate
to use this measure of learning efficiency. In fact this is a non-issue: LTP can only
be applied to active connections, and a connection only becomes active through
the depression of connections stronger than itself. Therefore, even with positive
feedback, it is still entirely appropriate to use $\phi_i$, the number of applications of
negative feedback required for complete learning, as a measure of learning efficiency.
LTP in this context is not so much a learning mechanism as an 'anti-forgetting'
mechanism.

Fig. 2. Collapse of data in Fig. 1b for the flip-flop problem and unbounded LTP,
according to the prediction of Equation (1).
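The 'slow change' protocol, in which a single input-output association is redrawn after each completed learning episode, could be sketched as below. The helper name `reroll_one` is hypothetical, and the sketch assumes the redrawn output is chosen uniformly from all outputs, so it may occasionally coincide with the old one; the text does not say whether that case is excluded.

```python
import random

def reroll_one(target, n_op, rng=random):
    """Copy of the map with one randomly chosen association redrawn."""
    new_target = list(target)
    stimulus = rng.randrange(len(new_target))
    new_target[stimulus] = rng.randrange(n_op)   # may repeat the old output
    return new_target

rng = random.Random(3)
n_ip, n_op = 10, 10
target = [rng.randrange(n_op) for _ in range(n_ip)]
maps = [target]
for _ in range(5):                # five successive adaptations
    maps.append(reroll_one(maps[-1], n_op, rng=rng))
```

Each entry of `maps` would be learned to completion before the next is presented, with the number of negative feedback signals recorded for every adaptation.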

We can explain these two phases as follows. Consider one input neuron. On average, $n_{ip}$
adaptations will pass before its associated output is reselected. Since it is
receiving positive feedback all the while, the amount of potentiation given to
its active outgoing connections will be proportional to $n_{ip}$. Thus, if the ratio
$\nu/\delta$ is greater than $1/n_{ip}$, there will be a divergence between the strengths of
the active and inactive connections, leading to the observed addiction. More
generally, for any $\nu > 0$, it is possible to think of a rate of change slow enough
that addiction will result.
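The threshold can be made concrete with a rough back-of-envelope estimate. It rests on two simplifying assumptions that are not spelled out in the text: that an active connection receives on the order of one reward per adaptation, and that each reward or punishment can be replaced by the mean of its uniform distribution.

```latex
\underbrace{n_{ip} \cdot \frac{\nu}{2}}_{\text{mean potentiation between reselections}}
\;>\;
\underbrace{\frac{\delta}{2}}_{\text{mean depression per signal}}
\quad\Longleftrightarrow\quad
\frac{\nu}{\delta} > \frac{1}{n_{ip}},
\qquad\text{so for } n_{ip} = 10,\ \delta = 1:\quad \nu > 0.1 .
```

Under these assumptions the estimate reproduces the boundary at 0.1 separating the two phases described above.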

Fig. 1b shows the results for a different problem, the 'flip-flop' problem. This
time, the network starts by having to learn the map $1 \to 1, 2 \to 2, 3 \to 3, \ldots$,
and, once this has been learned, the map required is switched to $1 \to n_{ip}$,
$2 \to (n_{ip} - 1)$, $3 \to (n_{ip} - 2), \ldots$; and we continue switching back and forth between
these two inverse input-output maps. Again, the learning process consists of
repeatedly running through the cycle of inputs until the complete cycle can be
run through without error. The slowness of change of the input-output map
is no longer an issue, since all of the input-output mappings are changed at
each adaptation. Thus, as one should expect, the values of $\phi_i$ required to
adapt remain bounded for all values of $\nu$. Nevertheless, performance is still
observably worse in all cases than the $\nu = 0$ case with LTD only.
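The two maps of the flip-flop problem, and the alternation between them, can be sketched as follows. The sketch uses zero-based indices, so input 0 corresponds to input 1 in the text, and it assumes equal numbers of inputs and outputs as in the simulations described here. The function names are illustrative.

```python
def flip_flop_maps(n_ip):
    """Identity map and its reversal (zero-based indices)."""
    forward = list(range(n_ip))                      # k -> k
    reverse = [n_ip - 1 - k for k in range(n_ip)]    # k -> n_ip - 1 - k
    return forward, reverse

def target_for_adaptation(step, maps):
    """Alternate between the two maps on successive adaptations."""
    return maps[step % 2]

maps = flip_flop_maps(10)
schedule = [target_for_adaptation(step, maps) for step in range(4)]
```

For an even number of inputs the two maps share no association, so every input has to be rewired at each switch.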

To explain this loss of performance, consider again a single input. Once it has been correctly
wired up to its associated output, it will have a window of time (while the
rest of the network is still learning) in which its active outgoing synapses will
be potentiated. The average for this time window will be controlled by the
system size, i.e. a constant $f(n_{ip}, n_{op})$ for any given network, and the total
divergence between active and inactive connections will be proportional to $\nu$;
thus the gain in the amount of negative feedback required to adapt will
scale with $\nu/\delta$. Mathematically speaking, we have

$$\langle \phi_i \rangle = n_{ip} n_{op} + f(n_{ip}, n_{op}) \, \frac{\nu}{\delta}, \qquad (1)$$

which is confirmed by the data collapse achieved in Fig. 2.

Fig. 3. Adaptive ability when bounded LTP is included. (a) Slow change problem.
Addiction is prevented but for $W_{\max} > 1 = W$, adaptive ability is worse than in
the LTD-only ($\nu = 0$) version. (b) Flip-flop problem. $\langle \phi_i \rangle$ increases to a maximum
value determined by $\nu$ (see Fig. 1b). Data averaged over 128 realizations.
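One way such a data collapse could be checked is to subtract the LTD-only baseline from each measured average and divide by the ratio of potentiation to depression; if the excess is proportional to that ratio, as argued above, results for different potentiation strengths fall onto a common value. The function name `rescaled_excess` is hypothetical, and the numbers fed to it are synthetic stand-ins generated from the assumed form itself, not simulation results.

```python
def rescaled_excess(phi_mean, nu, n_ip, n_op, delta=1.0):
    """Excess negative feedback over the baseline n_ip * n_op,
    per unit of nu / delta. Requires nu > 0."""
    if nu <= 0:
        raise ValueError("rescaling is only defined for nu > 0")
    return (phi_mean - n_ip * n_op) * delta / nu

n_ip, n_op = 10, 10
f_assumed = 40.0   # made-up constant, for illustration only
# synthetic averages built from the assumed linear form (not real data)
phi_by_nu = {nu: n_ip * n_op + f_assumed * nu for nu in (0.1, 0.5, 1.0)}
collapsed = {nu: rescaled_excess(phi, nu, n_ip, n_op)
             for nu, phi in phi_by_nu.items()}
```

With real measurements, the spread of the rescaled values across potentiation strengths would indicate how well the proportionality holds.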



4 Bounded potentiation 



A method to avoid addiction while maintaining potentiation was proposed by
Parisi with respect to the Hopfield neural network model. By placing an
upper bound on synaptic strength, he was able to construct a 'memory which
forgets' and thus avoid the state of total confusion observed if the network
were overloaded. This can be easily applied to the minibrain model, requiring
synaptic strengths to be bounded in the interval $(-\infty, W_{\max}]$, with $W_{\max} \geq W$.
In this bounded case, should potentiation cause a synaptic weight $w_{ij}$ to go
over the limit, we simply reset it to $W_{\max}$.
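The bounded rule amounts to clipping after each reward. A minimal sketch, with the hypothetical helper `potentiate_bounded` and illustrative parameter values:

```python
import random

def potentiate_bounded(weight, nu, w_max, rng=random):
    """Reward by a random amount in [0, nu], then reset to w_max if exceeded."""
    return min(weight + rng.uniform(0.0, nu), w_max)

rng = random.Random(4)
w_max, nu = 1.5, 0.45             # illustrative values
weight = 1.0
history = []
for _ in range(20):               # twenty successive rewards
    weight = potentiate_bounded(weight, nu, w_max, rng=rng)
    history.append(weight)
```

There is no lower bound in this sketch, matching the interval given above, so depression is left unchanged.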

Fig. 3a shows the performance in the slow change problem of different networks
with $\nu = 0.45$ and varying values of $W_{\max}$, as compared to a network with
$\nu = 0$. While the presence of the bound $W_{\max}$ has prevented the runaway addiction
seen in Fig. 1a, there is no improvement over the simple LTD-only case. In
general, the network performs worse, and it is only with $W_{\max} = W$ that the
network matches the performance of the standard LTD-only minibrain. This
is natural when one thinks that inactive connections will always have strength
of $O(W)$ whereas active connections will have strength $O(W_{\max})$.

By contrast a curious behaviour is observed in the 'flip-flop' problem of switching
back and forth between two different (non-overlapping) input-output maps
(Fig. 3b). In all cases the value of $\langle \phi_i \rangle$ increases with successive adaptations
to a maximum value controlled, not by $W_{\max}$, but by $\nu$ (see also Fig. 1b). How
can we explain this? Each time the input-output map switches, the system
must suppress the active connections and then search among all connections
for the required correct outputs. What makes this different from the slow
change problem is that here the majority of possible input-output connections
are always incorrect. Therefore, the majority of connections will be continually
weakened, never strengthened, and the gap between the average synapse
strength and the maximum synapse strength $W_{\max}$ will diverge. This is equivalent
to continually increasing the value of $W_{\max}$, meaning that in the long
run the system will behave as if this limit does not exist, reproducing the
behaviour observed in the case of unbounded LTP.



5 Selective punishment 



Finally, let us consider the case of selective punishment. Here a synapse
involved in a successful decision becomes permanently marked as 'good'; should
it later be punished, it is by an amount in the interval $[0, \delta^*]$ with $\delta^* < \delta$. Thus,
a previously good connection that has been suppressed is easier to reactivate
than a connection that has never been good.
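Selective punishment needs only a record of which connections have ever taken part in a successful decision. The sketch below keeps that record as a set of index pairs; the names `punish`, `mark_good` and `good`, and the value 0.5 for the reduced punishment range, are illustrative assumptions.

```python
import random

def mark_good(good, src, dst):
    """Permanently mark a connection that took part in a successful decision."""
    good.add((src, dst))

def punish(weights, good, src, dst, delta=1.0, delta_star=0.5, rng=random):
    """Depress a connection; 'good' connections get the smaller range."""
    limit = delta_star if (src, dst) in good else delta
    amount = rng.uniform(0.0, limit)
    weights[src][dst] -= amount
    return amount

rng = random.Random(5)
weights = [[0.9, 0.8]]
good = set()
mark_good(good, 0, 0)             # connection (0, 0) was once successful
cut_good = [punish(weights, good, 0, 0, rng=rng) for _ in range(100)]
cut_other = [punish(weights, good, 0, 1, rng=rng) for _ in range(100)]
```

Because previously good connections lose less per punishment, they stay closer to the top of the ordering and are the first to be tried again when the map changes back.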

As Fig. 4a shows, this makes no significant difference to the network's ability
to adapt in the slow change problem³. Should $\delta^*$ take a smaller value, the
effects of the selective punishment become more pronounced and adaptation is
initially slower, but this effect vanishes as the network gains 'good' connections
to all possible outputs.

However, in the second of our two problems (switching back and forth between
two distinct input-output maps) selective punishment proves a considerable
benefit (Fig. 4b). While for $\delta^* = 1.0 = \delta$ the value of $\langle \phi_i \rangle$ remains constant
with a value of $n_{ip} n_{op}$, as one would expect, values of $\delta^* < \delta$ see a decrease
in $\langle \phi_i \rangle$ with successive adaptations, towards a minimum value much lower
than without the selective punishment. Recall that the selective punishment
favours previously-good connections over others. Since in this case the number
of good responses for each input is small by comparison to the total number
of responses, this has the effect of drastically cutting learning times.

³ Different results are observed for different network topologies. For example, on a
random network, selective punishment proves very effective at enabling the system
to distinguish between those paths that go nowhere or terminate in endless loops,
and those that actually lead to output neurons.

Fig. 4. Adaptive ability with LTD only, but with selective punishment of previously
successful connections. (a) Slow change problem. No significant difference is observed
for different values of $\delta^*$. (b) Flip-flop problem. When lesser rates of punishment are
applied to 'good' connections, $\langle \phi_i \rangle$ decreases with successive adaptations: the system
has a memory of previously good responses. Data averaged over 128 realizations.



6 Conclusions 



Perhaps the key result of the present work has been to observe that, in the 
setup considered here, strengthening of synapses always carries within it the 
potential for divergence between the strengths of active and inactive connec- 
tions. This divergence is governed not merely by the level of potentiation but 
also by the system size, increasing as the network becomes larger. 

The minibrain is a 'toy' model, but it is nevertheless instructive to consider it
in the light of biological results. Both LTP and LTD are well-observed in
biological neuronal systems but their precise functions remain unclear [11]. A
variety of different points of view can be found in the literature, with a number
of authors explicitly endorsing a selectionist picture of neuronal dynamics [13]
where learning is by either elimination or depression of connections. Thinking
along these lines one might want to seek other means of positive feedback than
LTP, such as the 'synaptic forgiveness' proposed by Klemm et al. [14].

Other authors have suggested that learning may result from a balance of LTP
and LTD with a global feedback mechanism to prevent runaway strengthening
or weakening of synapses [15]. A modification to the minibrain along these lines
has recently been proposed by Bosman et al. [16]. The present results suggest
that such a global mechanism may not just be useful, but vital, if LTP is to
be an effective part of learning.